Separation of multiple concurrent speeches using audio-visual speaker localization and minimum variance beam-forming

Authors

Changkyu Choi

Donggeon Kong

Hyoung-Ki Lee

Sang Min Yoon

Abstract

Speaker segmentation is an important task in multi-party conversations. Overlapping speech poses a serious problem in segmenting audio into speaker turns. We propose an audio-visual speech separation system consisting of an array microphone with eight sensors and an omnidirectional color camera. Multiple concurrent speeches are segmented by fusing the two heterogeneous sensors. Each segmented speech is further enhanced by a linearly constrained minimum variance beamformer. Even when wide-band sound sources and pictures of humans co-exist in a reverberant environment, the proposed system effectively separates multiple target speeches.
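
For readers unfamiliar with the enhancement stage, the textbook form of a linearly constrained minimum variance (LCMV) beamformer can be sketched as follows. The abstract does not give the authors' constraint set or covariance estimator, so every symbol here is an assumption used for illustration: R is the spatial covariance matrix of the eight microphone signals, C is a matrix whose columns are steering vectors toward the constrained directions, and f holds the desired response for each constraint.

```latex
\begin{aligned}
\mathbf{w}_{\mathrm{LCMV}}
  &= \arg\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
     \quad \text{subject to} \quad \mathbf{C}^{H}\mathbf{w} = \mathbf{f} \\
  &= \mathbf{R}^{-1}\mathbf{C}
     \left(\mathbf{C}^{H}\mathbf{R}^{-1}\mathbf{C}\right)^{-1}\mathbf{f}
\end{aligned}
```

With a single unit-gain constraint toward one localized speaker, that is C = a(θ) and f = 1, this reduces to the familiar minimum variance distortionless response weights w = R⁻¹a(θ) / (a(θ)ᴴR⁻¹a(θ)). Whether the proposed system uses this single-constraint form or adds further constraints is not stated in the abstract.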

Similar resources

Real-Time Speaker Localization and Speech Separation by Audio-Visual Integration

Robot audition in real-world should cope with motor and other noises caused by the robot's own movements in addition to environmental noises and reverberation. This paper reports how auditory processing is improved by audio-visual integration with active movements. The key idea resides in hierarchical integration of auditory and visual streams to disambiguate auditory or visual processing. Th...

Improvement of three simultaneous speech recognition by using AV integration and scattering theory for humanoid

This paper presents improvement of recognition of three simultaneous speeches for a humanoid robot with a pair of microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech are difficult, because the number of simultaneous talkers exceeds that of its microphones, the signal-to-noise ratio is quite low (around -3 dB) and noise is not stable d...

Using audio and visual information for single channel speaker separation

This work proposes a method to exploit both audio and visual speech information to extract a target speaker from a mixture of competing speakers. The work begins by taking an effective audio-only method of speaker separation, namely the soft mask method, and modifying its operation to allow visual speech information to improve the separation process. The audio input is taken from a single chann...

Robust audio-visual speech synchrony detection by generalized bimodal linear prediction

We study the problem of detecting audio-visual synchrony in video segments containing a speaker in frontal head pose. The problem holds a number of important applications, for example speech source localization, speech activity detection, speaker diarization, speech source separation, and biometric spoofing detection. In particular, we build on earlier work, extending our previously proposed ti...

A crack localization method for beams via an efficient static data based indicator

In this paper, a crack localization method for Euler-Bernoulli beams via an efficient static data based indicator is proposed. The crack in beams is simulated here using a triangular variation in the stiffness. Static responses of a beam are obtained by the finite element modeling. In order to reduce the computational cost of damage detection method, the beam deflection is fitted through a poly...

Publication date: 2004
